Jungreuthmayer etal. BMC Bioi informatics 201 3, 14:31 8 
http://www.biomedcentral.eom/1 471 -21 05/1 4/318 



METHODOLOGY ARTICLE OpenAccess 



Comparison and improvement of algorithms 
for computing minimal cut sets 

Christian Jungreuthmayer 1 ' 2 , Govind Nair 1 ' 2 , Steffen Klamt 3 and Jurgen Zanghellini 1 ' 2 * 



Abstract 

Background: Constrained minimal cut sets (cMCSs) have recently been introduced as a framework to enumerate 
minimal genetic intervention strategies for targeted optimization of metabolic networks. Two different algorithmic 
schemes (adapted Berge algorithm and binary integer programming) have been proposed to compute cMCSs from 
elementary modes. However, in their original formulation both algorithms are not fully comparable. 

Results: Here we show that by a small extension to the integer program both methods become equivalent. 
Furthermore, based on well-known preprocessing procedures for integer programming we present efficient 
preprocessing steps which can be used for both algorithms. We then benchmark the numerical performance of the 
algorithms in several realistic medium-scale metabolic models. The benchmark calculations reveal (i) that these 
preprocessing steps can lead to an enormous speed-up under both algorithms, and (ii) that the adapted Berge 
algorithm outperforms the binary integer approach. 

Conclusions: Generally, both of our new implementations are by at least one order of magnitude faster than other 
currently available implementations. 
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Background 

The aim of metabolic engineering is to (re-)allocate avail- 
able cellular resources in order to induce/stimulate cells 
to produce substances of interest. For instance, by redi- 
recting intracellular carbon fluxes, product yields can 
be boosted and optimized [1,2]. However, the identifi- 
cation of engineering targets is not straight-forward as 
cellular metabolism is a highly interconnected and regu- 
lated system of reactions. Consequently, naive interven- 
tions sometimes are ineffective or worse, adversely affect 
other, even quite distant cellular functions. To deal with 
the complex interactions in cellular metabolism and to 
identify promising engineering targets several in silico 
approaches have been developed [3-9]. Here we are par- 
ticularly concerned with two algorithms [10,11], which 
are based on elementary mode (EM) analysis [12,13] and 
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eventually compute intervention strategies as minimal 
cut sets. 

EM analysis was successfully used to identify engi- 
neering targets for the production of amino acids [14], 
biofuels [15,16], and secondary metabolites [17] in various 
organisms from C. glutamicum [14] to E. coli [15,16] 
to S. cerevisiae [18] to A. niger [19]. In fact, EM analy- 
sis is ideally suited for metabolic engineering [20,21] as 
it allows for an unambiguous and unbiased decomposi- 
tion of the analyzed network into inseparable, biologically 
meaningful steady-state pathways. Any intracellular flux 
distribution can be represented as a properly weighted 
combination of these EMs. Thus, the full set of EMs 
describes all possible steady-state functions. Conversely, 
the cells metabolic capabilities can be restricted if EMs 
are removed from the network. To remove an EM from 
the network it suffices to delete at least one contributing 
reaction [12,13]. However, as each reaction supports more 
than one EM, other network functionality will be affected, 
too. Now the question may be phrased as an optimization 
problem. The task is to find a minimal intervention strat- 
egy, which removes all unwanted functionality from the 
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network while, at the same time, keeps desirable network 
properties. 

Recently, Hadicke and Klamt [11] introduced the 
concept of constrained minimal cut sets (cMCSs) to 
predict suitable minimal intervention strategies for a 
given design criterion. They also presented an algorithm 
(adapted Berge algorithm [11]; see also [22,23]) by which 
cMCSs can be computed from EMs. Jungreuthmayer and 
Zanghellini [10] conceived an alternative method to com- 
pute cMCSs by solving a binary integer program (BIP) 
over the EMs. 

By adapting the BIP originally presented in [10] we 
first demonstrate that both algorithms deliver indeed 
equivalent results. Inspired by the theory of integer pro- 
gramming, we then develop efficient preprocessing pro- 
cedures, which allow both methods to handle problems 
with hundreds of millions of EMs. Finally, by computing 
intervention strategies in several realistic networks, we 
benchmark and compare the computational performance 
of both algorithms. 

Methods 

EMs are an unbiased way to characterize metabolic net- 
works. An EM is defined by three properties [12,13]: (i) it 
is a non-trivial, steady state flux distribution through the 
network, (ii) it obeys all thermodynamic constraints on 
reaction reversibilities, and (iii) no subset of an EM exists 
which also is an admissible flux distribution and obeys (i) 
and (ii). By this definition, an EM is a minimal, biologically 
meaningful, steady-state pathway through a network. An 
EM can be represented as a (flux) vector or by the set of 
active reactions in the EM. Herein we will mainly use the 
latter. 

In the following we assume that all EMs are known. 
Several tools to calculate EMs are freely available [24-27]. 

cMCS theory 

Hadicke and Klamt [11] defined cMCSs as solutions / of 
an intervention problem 

I = I(T,D,n). (1) 

Here, D and T denote sets of desired and target modes, 
respectively. The latter contains all EMs, which need to be 
removed from the network. The former contains all EMs 
with favorable functionality. An intervention / will be a set 
of reactions that are deleted (knocked-out) in the network. 
An EM is hit (and becomes inoperable) if at least one reac- 
tion of / is part of the EM. The variable n denotes the 
minimum number of desired EMs, which have to "survive" 
the intervention. For a given intervention /, we collect all 
the surviving desired modes in the set £>/. 

A proper solution / of equation (1) is a set of reactions 
obeying two conditions: First, the removal of the reac- 



tions in / will delete the complete target set, T, from the 
network 

tni ^0 WgT, (2) 

and no subset of / will do so. This is the MCS property. To 
be a constrained MCS the intervention / will keep at least 
n desirable EMs unaffected, i.e. 

\Dj\ > n. (3) 

As each EM represents a unique pathway through a net- 
work, removing it from the network means to block that 
path, which is easily doable by deleting at least one con- 
tributing reaction. Thus, to meet condition (2), the task 
is to find a (minimal) hitting set such that all pathways 
in T are blocked [see equation (2)]. Mathematically, this 
problem is also known as dualization of a (hyper-)graph, 
a fundamental problem in discrete mathematics [28]. Sev- 
eral algorithms for calculating hitting sets are available, of 
which the Berge algorithm [22] has been shown to per- 
form favorably for the problems considered herein [23]. 
However, minimal hitting sets ensure only that all target 
modes are hit but do not per se ensure the constraint (3), 
i.e. the survival of n desired modes. 

A simple strategy to fulfill equation (2) in combination 
with the constraint in equation (3), is to first calculate all 
possible minimal hitting sets and then, in a second step, 
to only select those solutions which also obey equation 
(3). However, the computational performance can be opti- 
mized if the constraint equation (3) is checked "on the fly", 
leading to the adapted Berge algorithm presented in [11]. 

A pseudo-code of the adapted Berge algorithm can be 
found in [11], in the following we give a small example to 
explain basic principles of the Berge algorithm. Consider 
a hypergraph with hyperedges s\ = {a, b} and 82 = {a, c] 
(in our application, s\ and 82 would represent target EMs). 
The algorithm finds first all minimal hitting sets (cut sets) 
for the first edge, i.e. y\ = {a} and Y2 = It then adds 
the next edge, 82, and checks whether the already calcu- 
lated cut sets are also cut sets for the current edge. Since 
yi is hitting 82, Yi is kept unchanged. However, Y2 is not 
a cut set for 82 and, thus, is removed from the list of cut 
sets. Instead, two new cut sets are created by individu- 
ally adding each element of 82 to Y2> i-e. 73 = {b,a} and 
y 4 = {b, c}. To guarantee minimality the algorithm checks 
if a newly created cut set is a superset of an already exist- 
ing one. That is, Y3 gets removed from the set of cut sets as 
it is a superset of 71. Next, a new edge is added to the sys- 
tem and the calculation cycle starts over. Execution stops 
when all hyperedges are processed. To account for the 
intervention problem and accelerate the classic algorithm, 
Hadicke and Klamt suggested to first check if a newly gen- 
erated cut set is consistent with the constraint (3) and only 
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then check its minimality against all previously calculated 
cut sets [11]. This modification leads to the adapted Berge 
algorithm [11] which will be used in the following. 

cMCSs can be formulated as a BIP 

In a recent paper [10] we showed that if \D\ = n then the 
intervention / = 1(1 \D, n) is representable as a BIP. How- 
ever, even the general problem that at least n out of \D\ 
modes need to survive the intervention can be formulated 
as a BIP. 

Let e be an EM of a metabolic network with m reac- 
tions, fulfilling the steady state condition, and b = b(e) its 
binary representation, 



V := b\e l ) 



1 He 1 ^ 0 
0 if ^ = 0 



i = {l t ... f m} 



(4) 



b l indicates whether reaction i is part of the EM e. 
A solution x to equation (1) can be found by solving the 
following BIP: 



max ||#|| 
s.t.b T d x> \\b d \\f, 
b T d X< ||*rf||(l 

bjx<\\b t \\-l, 
\\y\\ > n, 

X — (x ■)...■) oc ) , 

y=(y\...,yW)\ 



de {!,.., \D\}, 

te{l,..,|r|}, 

x l e {0, l}Vi, 
/ 6 {0, l}Vi. 



(5a) 
(5b) 
(5c) 
(5d) 
(5e) 
(5f) 
(5g) 



Here we used the indices d and £ as a reminder that the 
EM vectors, bu are elements of the sets D and T, respec- 
tively. The solution vector, x, is the binary representation 
of a single cMCS, where x l = 0 marks reactions which 
get deleted, while x l = 1 stands for reactions that remain 
unaffected by the genetic intervention. The elements of 
the binary, auxiliary vector, y, indicate whether or not a 
desirable mode survives the intervention (1 and 0, respec- 
tively). Note that our notation uses superscripts to denote 
coordinates of vectors and subscripts to denote different 
vectors. Finally, ||*|| := J^ILi %l represents the multilinear 
norm of x. 

Suppose y d = 1, then equation (5c) always holds and 
can be omitted. Equation (5b) requires that x l > b l d , 
V* G {1, . . . , m). Only then the product of b T d and x is 
equal to the norm of b^ Thus, bd is included in the final 
design. In contrast to bj, b t will be removed from the net- 
work as equation (5d) requires that the product bjx is 
smaller than the \ \b t \\. This is the case only if at least one 
reaction in b t is deleted. Except for equation (5e), the sys- 
tems of equations in this case resembles the BIP problem 
presented in Jungreuthmayer and Zanghellini [10]. 



If y d = 0, then equation (5b) is ineffective. Equation (5c) 
however simplifies into a "kill constraint", thus eliminating 
bd from the surviving modes. 

The binary auxiliary variables y = (y 1 , . . . ,3/ |jD ') T were 
introduced to guarantee that at least n out of \D\ modes 
survive the intervention. In both cases \\y\\ counts the 
number of surviving desired modes, and equation (5e) 
makes sure that at least n desired modes survive the 
intervention. 

Alternative MCSs may be calculated by excluding exist- 
ing solutions Xj by adding the following constraints [10] to 
the set of equations (5a) -(5g): 



xjx < \\Xj\ 

T , 



1, 



[ 1 - Xj] X > 1, 



(6a) 
(6b) 



where we used 1 to denote an all-one-vector. Equation (6a) 
guarantees that new solutions are found in subsequent 
steps, whereas equation (6b) prevents the calculation of 
solutions that are supersets of already existing solutions. 
Note that the term 1 — Xj represents the binary comple- 
ment Of Xj. 

The number of constraints added to the BIP can almost 
be cut in half (in fact, n/2 — 1) by keeping in mind that the 
norm of the current solution Xj will never be larger than 
the previous optimum Xj-\. To sequentially calculate all 
MCSs the full BIP reads 



max | \xj 1 1 
*X.b T d Xj> \\b d \\y\ 

b^xj<\\bd\\(l+y d )-h 
bfxj>\\b t \\-h 
\\xjW < I ll> 

[l—Xk] T Xj > 1, 

IWI 

Xj = (Xj , . . . , Xj ) , 



y = (y >■ 



(7a) 

d e {1, .., \D\], (7b) 
(7c) 

f€{l,..,|2U (7d) 
\\xo\\=m, (7e) 
/cg{0,..,;-1}, (7f) 
(7g) 

x l e {0, 1}VZ, (7h) 
/' G {0, 1}V/. (7i) 



If iteratively applied, the BIP in equation (7) returns all 
MCSs, Xj, sorted in increasing order of deletions. Note 
that although the constraint in equation (7e) is redundant, 
it significantly enhances the computational performance 
of the BIP solver. 

Preprocessing methods 

Mathematically, BIPs are classified as NP-hard problems. 
However, extensive research has focused on improving the 
formulation of BIPs. The basic idea is to use simple logic 
rules which turn a BIP into a "better" formulation, which 
is easier to solve [29]. Standard BIP preprocessing rules 
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essentially fix variables, improve bounds or detect inactive 
constraints [29]. 

In the following we will be concerned with standard BIP 
preprocessing methods to reduce the size of the interven- 
tion problem in equation (1) but not with the internal 
structure of the algorithms. These preprocessing proce- 
dures will allow to reduce the size of the intervention 
problem in equation (1), which can then be solved by the 
Berge algorithm or a BIP. In the following, by "Berge algo- 
rithm" we mean the adapted Berge algorithm reported by 
Hadicke and Klamt [11] which extends the standard Berge 
algorithm to compute only minimal hitting sets (cut sets) 
that comply with the constraint (3) on the desired modes 

in]. 

We assume that all EMs are converted to their binary 
representation according to equation (4). Furthermore, we 
split the complete set of EMs in three sets, D, N, and T. 
Here the neutral set, N, contains all (binary) EMs, which 
are neither element of D nor T. 

Step L First, we remove all reactions that are simultane- 
ously zero in all EMs of T. These reactions do not support 
any EM in T. Deleting them will not remove any unwanted 
mode. 

Step 2. Next, essential reactions are identified. If delet- 
ing a reaction reduces the number of surviving modes in D 
to less than n [i.e. violates equation (3)], then this reaction 
is considered to be essential and cannot be knocked out. 
A reaction i is essential if \D\ — s l < n, with s l = Yl]=i x )- 

Consider the example in Table 1 with \D\ = 5 and n = 3. 
Rl and R7 are essential reactions, as for them \D\ — s l = 
5 — 3 = 2<3 = n, which indicates that knocking out 
Rl or R7 will kill more desirable modes than allowed. We 
note that if \D\ = n, all active reactions are essential. 

In general, the more essential reactions we find, the 
more the system can be reduced. Consequently, it is ben- 
eficial if n is large (ideally n = \D\), as this results in 
the maximum number of essential reactions. Removing all 
essential reactions from the system is a critical step that 
opens the possibility to apply several other preprocessing 
procedures. 

The removal of all essential reactions results in an 
important change of the system. By definition EMs are 



Table 1 Example of determining essential reactions 
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non-decomposable, i.e. an EM is not a subset of any other 
EM. However, if the essential reactions are removed then 
the residual EMs may become subsets or duplicates of 
other modes. Hence, the next step is to find all duplicate 
modes in T and to remove them from the system. 

Step 3. Next, we screen T to find and remove resid- 
ual EMs that are supersets of other residual EMs in T. 
Consider the target set (of residual EMs), T, shown in 
Table 2. The modes are sorted in order of ascending norm. 
The example illustrates that mode t\ can be removed by 
knocking out reaction R2. However, knocking out reaction 
R2 also kills t<i, as is a superset of t\. 

The same procedure can be applied to the other modes 
as well. Mode £3 has a norm of 2 and is killed either by 
knocking out reaction R4 or reaction R7. As both reac- 
tions are part of £4 and £5, they are certainly removed if 
mode £3 is killed. 

Step 4. In a final preprocessing step, we remove dupli- 
cate reactions across all EMs in both sets, D and J 7 . Using 
the illustration in Table 2, this would mean that we remove 
duplicate columns. Note that this step is most effective 
after all supersets were removed. For instance, in Table 2 
columns Rl, R3 and R8 are identical only if ti, and £5 
are removed. In this step it is not possible to analyze D and 
T separately. Reactions need to be identical in both sets, 
D and T, in order to be removed. 

Implementation 

We implemented the BIP algorithm in C using Gurobi 
Optimizer 5.0, http://www.gurobi.com/ for solving the 
BIP problem. The adapted Berge algorithm was imple- 
mented in C. The software is available from the authors on 
request. The simulations were all carried out on an Intel 
Xeon CPU X5650 @ 2.67GHz under a Linux operating 
system. 

Test cases 

We used the E. coli core model, EO, [30] and two smaller 
models, El and E2, which were derived from the EO model 



Table 2 Set of (residual) target modes before and after 
subset-superset elimination 
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Modes which are removed during the preprocessing, are marked by *. Note that 
the residual target modes t], .. .,t 6 are no longer EMs, as they have already gone 
through step 1 and 2 of our preprocessing procedure. 
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Figure 1 Overview of the different E coli models. For simplicity we only show pathways. Cofactors etc. are suppressed. Metabolites contributing 
to the biomass are depicted in yellow. Pathways included in the E2 model are indicated in red. E1 contains the red and blue pathways only, while EO 
[30] incloses all reactions, including the non-colored pathways. A detailed listing of all models may be found in the Additional file 1 . 
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Table 3 Topological properties of the E. coli models used 
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by removing several reactions. Compared to the EO model, 
glucose was considered as the only carbon substrate for 
El and exchange of a-ketoglutarate, acetaldehyde, acetate, 
formate, lactate, and pyruvate was not allowed. In addi- 
tion to these modifications we also removed the glyoxy- 
late shunt and the (NAD and NADP dependent) malic 
enzymes to obtain the E2 model from El. All three models 
are illustrated in Figure 1. Their main topological proper- 
ties are summarized in Table 3. A list of reactions for these 
models may be found in the (Additional file 1). 

To test the numerical efficiency of the implemented 
MCS algorithms we set up the following intervention 
problems: We first identify the most efficient EMs in 
all models. Efficiency is defined as the product between 
growth rate and ethanol secretion. Next, we classify all 
EMs to be desirable, whose ethanol secretion is larger or 
equal than the excretion of the most efficient EMs. Tar- 
gets are all other modes that do not utilize ethanol. Modes 
which take up ethanol (negative secretion) are considered 
neutral, as ethanol uptake is repressed in the presence 



of glucose in the growth medium [31]. Therefore these 
modes do not need to be targeted. In Figure 2 we illustrate 
the intervention problem and the choice of A N, and T for 
the E2 model. The major properties of the design criteria 
for the different E. coli models are listed in Table 4. 

Results 

Berge algorithm outperforms the BIP 

Figure 3 shows the computation time to calculate all 
MCSs using either method as a function of the mini- 
mally required number n of surviving desired EMs. We 
used the design criteria outlined above. At n = 2 we 
found 81,168 and 441,095 MCSs in E2 and El, respec- 
tively. (The number of MCSs as function of n may be 
found in Additional file 2: Figure SI.) In all tested situa- 
tions the adapted Berge algorithm clearly outperforms the 
BIP. Even in the most demanding case (n = 2), the Berge 
algorithm calculated all MCSs in El in less than 10 min. 
On the other hand, it already took the BIP 22 hours to cal- 
culate all 331 MCSs for n = 85 in the smaller E2 model. In 
the same situation the Berge implementation finished in 
0.4 sec. 

It is interesting to observe that over a wide range of val- 
ues for n the runtime in both methods changes according 
to a power law (see Figure 3). However, only for the Berge 
algorithm the exponent remained approximately constant 
in both test cases. 

Preprocessing-times are essentially independent of n 
and only scale with the total number of processed EMs. 
For cases with very few MCSs (see Additional file 2: 
Figure SI) the Berge algorithm took even less time than 
the preprocessing. 




0 0.002 0.004 0.006 0.008 0.01 0.012 0.014 
growth / C-flux in 



0.016 0.018 



Figure 2 Phenotypic plot of all EMs in E2. Flux values are normalized to the total carbon influx (C-fluXj n ). EMs are color coded with respect 
to the modes' efficiency, defined as the product between normalized growth and normalized ethanol secretion (etoh ou t)- The most efficient EM is 
marked by an arrow. The areas D, N, and T indicate the corresponding sets of EMs for the intervention problem l n (T,D,n) with n e {1, . . . ,n max }. 
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Table 4 Cardinalities for the sets involved in the 
intervention problem I n (T,D,n) with n € {1, ...,«max} 



E2 



El 



\D\ 


5,132 


(9%) 


46,254 


(10%) 


\N\ 


18,447 


(33%) 


217,877 


(45%) 


\T\ 


32,087 


(58%) 


221,038 


(45%) 


^max 


1,120 


(2%) 


11,436 


(2%) 



n max denotes the maximum number of "surviving" modes for a given set of T 
and D. That is, for n > n max the intervention l n is infeasible. Numbers in brackets 
give the cardinality in percent of the total number of EMs. 



Preprocessing reduces overall computation time 

To test the impact of our preprocessing procedures we set 
up identical intervention problems for all models. That 
is, we solved 1$ = I(TQ>D,n)> I\ = liTx.D.n), and 
h = I(T2,D ) n), where we used the indices 0, 1, and 2 
to denote the dependence on the models E0, El, and E2, 
respectively. We used identical sets of desired EMs in all 
models, i.e. Dq = D\ = D 2 = D and no = n\ — n<i = n. 
Tu i e {0, 1, 2} consisted of all EMs not contained in £>. 
Values for £>, n, T{ and the runtimes for the Berge algo- 
rithm in two different cases (n/\D\ ^ 1 and n/\D\ <^ 1) 
may be found in Table 5. 

In the most demanding case (n/\D\ 1), the Berge 
algorithm with preprocessing identified 1,720 MCSs in 
less than 30 minutes in the large E0 model with its 124 
million EMs (see Table 5). Only 1% of the computation 



time was used for the Berge algorithm. Ninety-four per- 
cent of the computation is spent on preprocessing. After 
preprocessing the initial system of 124 million EMs was 
reduced to approximately 300,000 modes. In all tested 
cases with enhanced preprocessing, reading EMs and pre- 
processing took at least 90% of the total computation time. 
We repeated the same simulation without preprocessing. 
While the total runtime with and without preprocessing 
is comparable if only a few MCSs are found, the runtime 
savings in MCS calculation more than outweigh the run- 
time losses due to preprocessing if many MCSs solve an 
intervention problem. To emphasize this point we show 
the total runtime as function of the number of MCSs for 
Figure 3 in the Additional file 2: Figure SI. 

Finally in Table 6 we show several examples from the lit- 
erature, which can be easily and efficiently solved by either 
method. As a comparison we have also listed runtimes 
using the current version (version 2012.1) of CellNet Ana- 
lyzer (CNA) [32]. CNA uses a MATLAB implementation 
of the Berge algorithm. However, its preprocessing capa- 
bilities are less developed. That is why both programs, our 
Berge-algorithm and BIP, outperform CNA in all instances 
by at least one order of magnitude. Note however that 
CNA uses a MATLAB script, while our programs are 
implemented in C. A significant part of the performance 
difference may therefore be attributed to the slower per- 
formance of MATLAB compared to native executables 
written in C. 
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Figure 3 Runtime for the Berge algorithm (left panel) and the BIP (right panel) for the models E2 (red) and El (blue), respectively. The 

runtimes for preprocessing (circles) and calculating MCSs are plotted as function of n. n is the lower bound of desired EMs, which must survive the 
intervention. Table 4 gives the size of the intervention problem for both models. As the trend is clearly visible, we did not evaluate the BIP for n < 8C 



Table 5 Runtime of the Berge algorithm with and without preprocessing (PP) for two intervention problems I(T, D, n) with differing n 



E2 E1 EO 



\D\ 


489 


(0.88%) 




489 




(0.10%) 




489 


(0.00%) 


\T\ 


55,177 


(99.12%) 




484,680 




(99.90 


%) 




1 24,340,727 


(100.00%) 










m = 400; m/|D| 


= 0.82 












MCSs 


4 








11 








44 




Min. deletions 


5 








8 








16 




Max. deletions 


5 








8 








17 






Without PP 


With PP 


Without PP 






With PP 


Without PP 


With PP 


System size 


60 x 55,177 


13 x 


1 


64 x 484,680 






17x3 


75 x 1 24, 340, 727 


28 x 6 


Reading EMs (sec) 


0.031 (26%) 


0.035 


(43%) 


0.281 (26%) 




0.308 


(37%) 


71.858 


(24%) 


76.274 (26%) 


Preprocessing (sec) 


0.078 (65%) 


0.044 


(55%) 


0.717 (65%) 




0.495 


(60%) 


194.328 


(64%) 


202.539 (69%) 


Calculate MCSs (sec) 


0.011 (9%) 


0.001 


(2%) 


0.100 (9%) 




0.021 


(3%) 


35.866 


(12%) 


15.294 (5%) 


Total (sec) 


0.121 (100%) 


0.081 


(100%) 


1.099 (100%) 




0.823 


(100%) 


302.053 


(100%) 


294.108 (100%) 










n 2 =40; n 2 /\D\-. 


= 0.08 












MCSs 


72 








274 








1,720 




Min. deletions 


5 








6 








14 




Max. deletions 


7 








10 








19 






Without PP 


With PP 


Without PP 






With PP 


Without PP 


With PP 


System size 


60 x 55,177 


47 x 2, 295 


64 x 484,680 




51 x 8,664 


75 x 1 24, 340, 727 


62 x 321,272 


Reading EMs (sec) 


0.031 (10%) 


0.033 


(25%) 


0.277 (7%) 




0.308 


(20%) 


71.084 


(3%) 


77.771 (5%) 


Preprocessing (sec) 


0.078 (26%) 


0.083 


(65%) 


0.715 (17%) 




1.162 


(76%) 


193.805 


(7%) 


1,493.367 (94%) 


Calculate MCSs (sec) 


0.196 (64%) 


0.012 


(10%) 


3.136 (76%) 




0.068 


(4%) 


2,475.110 


(90%) 


15.909 (1%) 


Total (sec) 


0.306 (100%) 


0.128 


(100%) 


4.129 (100%) 




1.539 


(100%) 


2,740.004 


(100%) 


1,587.053 (100%) 


In all cases D are identical and T chosen such that it contains all remaining EMs. Note that the columns "without PP" state the runtimes without performing step 1 to 4 of our PP procedures. However, we still sort EMs in 
ascending order of norm. This is why PP-time is not zero even in the cases without PP. The row "system size" refers to the dimensions of the network, which enters the Berge-algorithm, i.e. after PP. 
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Table 6 Runtime analysis for the Berge algorithm and BIP using several examples from the literature with different 
design objectives 

Runtime (sec) 



Organism 


Objective 


\D\ 


in 


#MCS 


Min. A 


Max. A 


Berge 


BIP 


CNA 


£ coli [16] (anaerobic) 


ethanol 


12 


4,998 


1,048 


6 


9 


0.011 


0.287 


2.83 


E coll [16] (aerobic) 


ethanol 


12 


429,264 


55,488 


11 


15 


0.883 


2.174 


547.61 


E coll [15] (anaerobic) 


isobutanol 


7 


5,615 


760 


7 


10 


0.011 


0.233 


2.69 


E coll [33] (anaerobic) 


n-butanol 


7 


7,341 


2,280 


7 


10 


0.015 


0.226 


3.43 



Both algorithms use all preprocessing procedures. For comparison we also use the program package CellNetAnolyzer [32] which uses a MATLAB script of the Berge 
algorithm. (Abbreviations: #MCS, number of MCS; min. A, minimal number of deletions; max. A, maximal number of deletions; CNA, CellNetAnalyzer). 



Preprocessing strongly reduces the system size 

In Figure 4 we used a BIP and show the computation time 
as a function of the number of MCSs for the aerobic E. 
coli [33] model of Trinh et al. [16] (see line number 2 in 
Table 6 for model details). Note that although Table 6 lists 
55,488 different MCSs, the BIP (and our Berge algorithm 
for that matter) only needs to calculate 164 solutions. Due 
to preprocessing the original network is reduced from 71 
reactions and 429,275 EMs to an equivalent system with 
only 23 columns and 28 rows. This smaller system has 
164 MCSs, which are then reconstructed to the full set of 
MCSs by expanding duplicated columns. A similar obser- 
vation may also be made in Table 5. In these examples the 



system size is at least reduced by a factor of 30 (case E2, 
n 2 = 40). 

Surprisingly, the computation time does not monoton- 
ically increase with the number of solutions [i.e. with the 
number of additional constraints, see equation (7)] but 
drops dramatically whenever the norm of the solution 
decreases. Note that a decreasing norm means an increase 
in the number of required knockouts. In this model the 
computation time significantly drops after solution num- 
ber 11, 53, 116, and 156. At these instances the number 
of required deletions changes from 11 to 12, to 13, to 14, 
and to 15, respectively. In all these cases the constraint 
in equation (7e) decreases too and introduces a tighter 



Solution number 




Solution number 

Figure 4 Computation time of the BIP as function of the number of MCSs for the aerobic E. coli [33] model of Trinh et al. [1 6] (line number 
2 in Table 6). The accumulated computation time for the respective models is plotted in the lower panel. The top panel shows the computation 
time for each solution. Note that it is impossible to calculate the runtime per solution for the Berge algorithm as the algorithm does not allow to 
continuously output the solution. 
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bound on the system. This allows the solvers internal pre- 
processing to more efficiently compress the system, which 
in turn brings down the computation time. If, however, 
the norm of the solution does not change then the com- 
putation time scales approximately exponentially with the 
number of MCSs. This behavior is expected as each solu- 
tion adds a new constrain to the system, which makes it 
harder to solve. 

Discussion 

Recently cMCSs have been introduced to predict opti- 
mal intervention strategies in order to achieve an arbitrary 
metabolic objective [11]. Two algorithmic approaches 
have been published for their calculation [10,11]. Here we 
showed that both methods are equivalent. We addressed 
the numerical efficiency of both methods in typical design 
problems and found that in terms of runtime the Berge 
algorithm is superior compared to BIP. 

It may appear as a surprise that the Berge algorithm 
performs so well even for the large cases presented in 
this study, especially since the Berge method is known 
for its unfavorable performance in huge networks [34]. 
However, here we showed that efficient preprocessing can 
dramatically reduce the size of the networks. The adapted 
Berge algorithm could then be run on the reduced sys- 
tems. Apparently, for small systems the Berge algorithm is 
effective. 

The importance of preprocessing in the calculation of 
MCSs has been stressed earlier [23]. The preprocessing 
strategies introduced herein focus especially on the addi- 
tional constraints posed by cMCSs, whereas [23] dealt 
only with (unconstrained) MCSs. We were able to show 
that our implementation outperforms the currently avail- 
able tool for computing (c)MCSs (see Table 6). The per- 
formance gain can be attributed to both the improved 
preprocessing and the efficient implementation in C. 
Herein we used standard preprocessing routines, which 
are frequently applied in BIP [29]. Extensive literature 
on preprocessing in binary and integer programming is 
available, see for instance Savelsbergh [35] for a good sum- 
mary of basic ideas. Since cMCSs can be stated as a BIP, 
these methods are readily adoptable. However, due to the 
algorithmic complexity of BIP (solving numerous linear 
optimizations as part of one BIP, etc.), a full enumeration 
based on BIP seems not be competitive compared to the 
Berge algorithm (see Figure 3). Rather the usage of BIP 
preprocessing rules followed by the Berge algorithm to 
calculate cMCSs is suggested as an optimal computation 
strategy. 

The efficiency of preprocessing is dependent on the 
imposed design criterion. In the worst case the set of 
desired modes is empty (D = 0) and T contains all EMs 
of a network. This situation corresponds to unconstrained 
MCSs and thus to a full dualization of the hypergraph 



spanned by the target modes. Except for step 4, none of 
our preprocessing routines then provides an advantage 
and other solvers may be more appropriate [34] . However 
such cases are not relevant in the context of metabolic 
engineering, where we want to optimize favorable func- 
tionality. To fully utilize the potential of preprocessing the 
ratio n/\D\ should be close to one. This means that many 
essential reactions will be removed from the system, and 
as a result of that many supersets will be detected, too. 
However, in practice it may suffice if only a few EMs out 
of the set of desired modes survive, i.e. n/\D\ <C 1. Still, 
preprocessing provides a significant performance gain as 
indicated in Table 5. The runtime costs of preprocess- 
ing will be outweighed by the savings in MCS calculation, 
if the intervention problem has many solutions. In prac- 
tice, preprocessing will therefore be favorable, as typical 
applications have a few thousand solutions (see Table 6). 

In our paper [10] we used weights in the objective func- 
tion of the BIP to take experimental difficulties in the 
deletion of reactions into account. For instance, some 
reactions cannot be deleted as they are driven by diffu- 
sion, rather than catalyzed by an enzyme. Other reactions, 
on the other hand, may require the deletion of multiple 
genes as they are catalyzed by different enzymes in paral- 
lel. By an appropriate choice of the weights in the objective 
function BIP is able to predict the experimentally easiest 
deletion strategies first [10]. However, in the preprocess- 
ing procedures above we did not consider weights in the 
objective function. Identifying particular solutions in the 
complete list of MCSs has to be done in a separate post- 
processing step (for example by appropriately sorting the 
output, which can be done quite fast). Thus even with 
an additional post-processing step our implementation of 
Berge s algorithm will be faster than BIP. Note however 
that the integration of regulatory information into the 
cMCS framework is a unique feature of the BIP approach 
[10]. 

Both methods, the Berge algorithm as well as BIP, still 
show room for computational improvements. In the case 
of the Berge algorithm the computational bottleneck sits 
in the filtering of potential MCSs to determine if they 
are, in fact, true MCSs and not supersets of true MCSs 
[23]. Generating new MCS -candidates, however, is very 
quick. Therefore ways of enhancing the superset-filtering 
procedure will be the scope of future work. 

One disadvantage of the Berge algorithm is its inability 
to predict MCSs continuously during the runtime. During 
execution all MCSs remain candidate-MCSs. Only upon 
termination, when the minimality of all candidate MCSs 
has been checked against each other, candidate-MCSs 
become MCSs and can be outputted. Thus, even if we 
were interested in only one solution, the Berge algorithm 
will - in general - return more than one MCS upon ter- 
mination. However, other solvers are available [34] . Their 
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adaption for the current situation is the scope of further 
work. 

BIP on the other hand, is able to predict a single solu- 
tion without the need to enumerate all In fact, due to the 
optimization principle only one MCS with a smallest or 
largest number of deletions can be calculated. In Figure 4 
we illustrated the runtime per solution as function of 
the number of MCSs. The drop in runtime after cer- 
tain solutions indicates that more advanced preprocessing 
procedures may further reduce the runtime significantly. 
In fact, our preprocessing focused on standard procedures 
like variable fixing. More advanced methods will further 
reduce the runtime for both the Berge algorithm and the 
BIP. Additionally we used GUROBI, a commercially avail- 
able multi-purpose optimization toolbox, to solve the BIP. 
However, a specialized knapsack solver may potentially 
boost the performance. 

Conclusions 

We predicted minimal metabolic intervention strate- 
gies in typical metabolic engineering problems using 
two different methods (an adapted Berge algorithm and 
a BIP). We investigated the numerical performance of 
these approaches. Both methods significantly profited 
from the enhanced preprocessing procedures developed 
here. Under the tested conditions, our implementation 
of Berge s algorithm performed best even outperforming 
other, currently available software. 
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